The keygram in this tab shows 5 rows that are noticeably darker than the others: G major, C major, F major, C minor, and F minor. Of these five, C major is the darkest, meaning the model estimates that the song is in C major.
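A keygram of this kind is typically built by correlating the song’s chroma content with key profiles. The sketch below illustrates the general idea using the standard Krumhansl-Kessler profiles; this is an assumption about the approach, not necessarily the exact method the tool uses.

```python
import numpy as np

# Krumhansl-Kessler key profiles (an assumption: these are the classic
# published profiles, not necessarily the ones the keygram tool uses).
MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                  2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                  2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_key(chroma):
    """Correlate a 12-bin chroma vector with all 24 rotated key profiles
    and return the best-matching key as a string like 'C major'."""
    best_key, best_r = None, -np.inf
    for tonic in range(12):
        for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
            # Rotating the profile shifts its tonic to the given pitch class.
            r = np.corrcoef(chroma, np.roll(profile, tonic))[0, 1]
            if r > best_r:
                best_key, best_r = f"{NOTES[tonic]} {mode}", r
    return best_key

# Usage: a chroma vector dominated by C, E and G (a C major triad)
triad = np.zeros(12)
triad[[0, 4, 7]] = 1.0
print(estimate_key(triad))  # → C major
```

A keygram extends this by computing the correlations per time frame, so each row (key) gets a darkness value at every point in the song.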
Below is a chroma-based self-similarity matrix of the second song I submitted. The relatively dark square starting at about 65 seconds and spanning the next 25 seconds or so corresponds to the middle part of the track, signaling a relatively “stable”, repeated section. After about 100 seconds the track becomes more “chaotic”, with most parts not resembling any other part of the track. The thick green line at the end/top tells us that the final 15 seconds of the track are completely different from everything before them, while the very dark square at the top right of the graph shows that this ending is internally very consistent chroma-wise. In other words, the final few seconds of the track all sound very similar to each other.
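The matrix behind such a plot is usually just the pairwise cosine similarity between chroma frames. A minimal sketch with a toy chroma sequence (the section labels and frame values here are made up for illustration):

```python
import numpy as np

def self_similarity(chroma_frames):
    """Cosine similarity between every pair of chroma frames.
    chroma_frames: (n_frames, 12) array, one 12-bin chroma vector per frame."""
    X = chroma_frames / np.linalg.norm(chroma_frames, axis=1, keepdims=True)
    return X @ X.T  # entry (i, j) = similarity of frame i and frame j

# Toy track: section A (C major triad chroma), section B (different
# pitch content), then section A repeated.
a = np.tile([1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0], (4, 1)).astype(float)
b = np.tile([0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0], (4, 1)).astype(float)
frames = np.vstack([a, b, a])

S = self_similarity(frames)
```

Repeated sections show up as dark off-diagonal blocks: here `S[0, 8]` is 1.0 because frame 0 (first A section) and frame 8 (the repeat) are identical, while A-vs-B entries are much lower. A section that is “internally consistent”, like the ending described above, appears as a dark square on the diagonal.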
Using Stable Audio, after four songs generated from my own prompts yielded unconvincing results, I decided to stick to the prompt libraries it offers. Both songs were generated with the Stable Audio AudioSparx 2.0 model at a desired duration of 3 minutes. To generate the first track, I used the prompt “Lofi hip hop beat, chillhop” from the Chillhop prompt library. To generate the second track, I used all of the prompts from the Epic Rock prompt library, resulting in the final prompt “Post-Rock, Guitars, Drum Kit, Bass, Strings, Euphoric, Up-Lifting, Moody, Flowing, Raw, Epic, Sentimental, 125 BPM”.